Information extraction from scholarly articles is a challenging task due to the sizable document length and the implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles. Our approach condenses the article's full text into property-value pairs, presented as a segmented text snippet called a structured summary. We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph with the corresponding publicly available scientific articles, which we openly publish as a resource for the research community. Our results show that structured summarization is a suitable approach for targeted information extraction that complements other commonly used methods such as question answering and named entity recognition.
translated by Google Translate
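Scholarly knowledge graphs (KGs) provide a rich source of structured information representing the knowledge encoded in scientific publications. Given the sheer volume of published scientific literature, which comprises a plethora of inhomogeneous entities and relations describing scientific concepts, these KGs are inherently incomplete. We present exBERT, a method that leverages pre-trained transformer language models to perform scholarly knowledge graph completion. We model the triples of a knowledge graph as text and perform triple classification (i.e., whether a triple belongs to the KG or not). The evaluation shows that exBERT outperforms other baselines on three scholarly KG completion datasets in the tasks of triple classification, link prediction, and relation prediction. In addition, we present two scholarly datasets as resources for the research community, collected from public KGs and online resources.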
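The idea of modeling triples as text for binary classification can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `[CLS]`/`[SEP]` serialization, the tail-corruption strategy for negative examples, and the toy entities (`paper_A`, `dataset_X`, and so on) are all assumptions introduced here.

```python
import random

def triple_to_text(head, relation, tail):
    """Serialize a (head, relation, tail) triple as one text sequence.

    The [CLS]/[SEP] layout is an assumed BERT-style convention.
    """
    return f"[CLS] {head} [SEP] {relation} [SEP] {tail} [SEP]"

def corrupt_tail(triple, entities, known, rng):
    """Create a negative example by replacing the tail with an entity
    such that the resulting triple is not in the known graph."""
    head, relation, _ = triple
    candidates = [e for e in entities if (head, relation, e) not in known]
    return (head, relation, rng.choice(candidates))

# Hypothetical toy graph; the names are placeholders, not real data.
triples = [
    ("paper_A", "uses_method", "transformer"),
    ("paper_B", "evaluates_on", "dataset_X"),
]
known = set(triples)
entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
rng = random.Random(0)  # fixed seed so the sketch is reproducible

negatives = [corrupt_tail(t, entities, known, rng) for t in triples]

# Label 1: triple belongs to the KG; label 0: corrupted triple.
examples = [(triple_to_text(*t), 1) for t in triples]
examples += [(triple_to_text(*t), 0) for t in negatives]

for text, label in examples:
    print(label, text)
```

The labeled text pairs in `examples` are what a sequence classifier would be fine-tuned on; the classifier itself is omitted here.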
Vehicle-to-Everything (V2X) communication has been proposed as a potential solution to improve the robustness and safety of autonomous vehicles by improving coordination and removing the barrier of non-line-of-sight sensing. Cooperative Vehicle Safety (CVS) applications depend heavily on the reliability of the underlying data system, which can suffer from information loss due to inherent issues in its components, such as sensor failures or the poor performance of V2X technologies under dense communication channel load. In particular, information loss affects the target classification module and, subsequently, the performance of the safety application. To enable reliable and robust CVS systems that mitigate the effect of information loss, we propose a Context-Aware Target Classification (CA-TC) module coupled with a hybrid learning-based predictive modeling technique for CVS systems. The CA-TC consists of two modules: a Context-Aware Map (CAM) and a Hybrid Gaussian Process (HGP) prediction system. The vehicle safety applications then use the information from the CA-TC, making them more robust and reliable. The CAM leverages vehicles' path history, road geometry, tracking, and prediction, and the HGP provides accurate vehicle trajectory predictions to compensate for data loss (due to communication congestion) or inaccuracies in sensor measurements. Based on offline real-world data, we learn a finite bank of driver models that represent the joint dynamics of the vehicle and the driver's behavior. We combine offline training and online model updates with on-the-fly forecasting to account for possible new driver behaviors. Finally, our framework is validated using simulation and realistic driving scenarios to confirm its potential for enhancing the robustness and reliability of CVS systems.
Climate change is expected to intensify and increase extreme events in the weather cycle. Since this has a significant impact on various sectors of our life, recent works are concerned with identifying and predicting such extreme events from Earth observations. This paper proposes a 2D/3D two-branch convolutional neural network (CNN) for wildfire danger forecasting. To use a unified framework, previous approaches duplicate static variables along the time dimension and neglect the intrinsic differences between static and dynamic variables. Furthermore, most existing multi-branch architectures lose the interconnections between the branches during the feature learning stage. To address these issues, we propose a two-branch architecture with a Location-aware Adaptive Denormalization layer (LOADE). Using LOADE as a building block, we can modulate the dynamic features conditioned on their geographical location. Thus, our approach treats the feature properties within a unified yet compound 2D/3D model. In addition, we propose using an absolute temporal encoding for time-related forecasting problems. Our experimental results show that our approach outperforms other baselines on the challenging FireCube dataset.
Prostate cancer (PCa) is one of the most prevalent cancers in men, and many people around the world die from clinically significant PCa (csPCa). Early diagnosis of csPCa in bi-parametric MRI (bpMRI), which is non-invasive, cost-effective, and more efficient than multiparametric MRI (mpMRI), can contribute to precision care for PCa. The rapid rise of artificial intelligence (AI) algorithms is enabling unprecedented improvements in decision support systems that can aid in csPCa diagnosis and understanding. However, existing state-of-the-art AI algorithms based on deep learning are often limited to 2D images, which fail to capture inter-slice correlations in 3D volumetric images. The use of 3D convolutional neural networks (CNNs) partly overcomes this limitation, but it does not adapt to the anisotropy of the images, resulting in sub-optimal semantic representation and poor generalization. Furthermore, because labelled bpMRI data are scarce and difficult to produce, existing CNNs are built on relatively small datasets, leading to poor performance. To address these limitations, we propose a new Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptively fuses multiple 2D, 2.5D and 3D CNNs to effectively balance the representation of sparse inter-slice information and dense intra-slice information in bpMRI. A self-supervised learning (SSL) technique is further introduced to pre-train our network on unlabelled data so that it learns generalizable image features. In addition, we constrain our network to learn zonal-specific domain knowledge to improve the diagnostic precision for csPCa. Experiments on the PI-CAI Challenge dataset demonstrate that our proposed method achieves better performance for csPCa detection and diagnosis in bpMRI.
Measuring growth rates of apple fruitlets is important because it allows apple growers to determine when to apply chemical thinners to their crops to optimize yield. The current practice for obtaining growth rates involves using calipers to record the sizes of fruitlets across multiple days. Because of the number of fruitlets that need to be sized, this method is laborious, time-consuming, and prone to human error. In this paper, we present a computer vision approach to measure the sizes and growth rates of apple fruitlets. With images collected by a hand-held stereo camera, our system detects, segments, and fits ellipses to fruitlets to measure their diameters. To measure growth rates, we use an Attentional Graph Neural Network to associate fruitlets across different days. We provide quantitative results on data collected in an apple orchard and demonstrate that our system is able to predict abscise rates within 3% of the current method with a 7 times improvement in speed, while requiring significantly less manual effort. Moreover, we provide results on images captured by a robotic system in the field and discuss the next steps to make the process fully autonomous.
Continuous behavioural authentication methods add a unique layer of security by allowing individuals to verify their identity when accessing a device. Maintaining session authenticity is now feasible by monitoring users' behaviour while they interact with a mobile or Internet of Things (IoT) device, making credential theft and session hijacking ineffective. Such a technique is made possible by integrating the power of artificial intelligence and Machine Learning (ML). Most of the literature focuses on training machine learning models for the user by transmitting their data to an external server, which exposes private user data to threats. In this paper, we propose a novel Federated Learning (FL) approach that protects the anonymity of user data and maintains the security of that data. We present a warmup approach that provides a significant accuracy increase. In addition, we leverage a transfer learning technique based on feature extraction to boost the models' performance. Our extensive experiments on four datasets (MNIST, FEMNIST, CIFAR-10 and UMDAA-02-FD) show a significant increase in user authentication accuracy while maintaining user privacy and data security.
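To illustrate how federated learning keeps raw user data on the device, the sketch below shows a server-side aggregation step in which only model parameters are shared. It assumes plain federated averaging weighted by local sample count; the abstract does not state which aggregation rule the authors use, and the function name, weights, and sample counts are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    Only these parameters reach the server; raw data stays on the device.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical round with two clients holding 1 and 3 samples.
global_weights = federated_average(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
print(global_weights)  # [2.5, 3.5]
```

Weighting by sample count lets clients with more local data contribute proportionally more to the global model.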
The mixture of factor analyzers (MFA) model is an efficient model for analyzing high-dimensional data, in which the factor-analyzer technique, based on the covariance matrices, reduces the number of free parameters. The model also provides an important methodology for determining latent groups in data. Several studies have extended the model to asymmetric data and/or data with outliers, with some known computational limitations that have been examined in the frequentist setting. In this paper, we introduce an MFA model with a rich and flexible class of skew normal (unrestricted) generalized hyperbolic (SUNGH) distributions, together with a Bayesian structure that offers several computational benefits. The SUNGH family provides considerable flexibility to model skewness in different directions while also allowing for heavy-tailed data. The structure of the SUNGH family has several desirable properties, including an analytically flexible density that eases the computation required for parameter estimation. In factor analysis models, the SUNGH family also allows for skewness and heavy tails in both the error component and the factor scores. We discuss the advantages of using this family of distributions and demonstrate the efficiency of the introduced MFA model using real data examples and simulation.
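Anomaly-based intrusion detection systems (IDS) have been a hot research topic because of their ability to detect new threats, rather than only the memorized signatures that signature-based IDS can detect. This is especially true now that advanced technologies have increased the number of hacking tools and the impact of attacks. The problem with any anomaly-based model is its high false-positive rate, which is why anomaly-based IDS are not commonly used in practice: such models classify an unseen pattern as a threat even when it is normal behaviour that simply was not included in the training dataset. This type of problem is called overfitting, where the model is unable to generalize. Optimizing anomaly-based models with a large training dataset that includes all possible normal cases may be an optimal solution, but it cannot be applied in practice. Although we can increase the number of training samples to include more normal cases, we still need a model with a greater ability to generalize. In this research paper, we propose applying deep models instead of traditional models because they generalize better. Thus, we obtain fewer false positives by using big data and deep models. We compare machine learning and deep learning algorithms for optimizing anomaly-based IDS by reducing the false-positive rate. We run an experiment on the NSL-KDD benchmark and compare our results with one of the best classifiers used in traditional learning for IDS optimization. The experiment shows a 10% reduction in false positives when deep learning is used instead of traditional learning.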
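Since the comparison above hinges on the false-positive rate, a short sketch of how that metric is computed may help. It uses the standard definition FPR = FP / (FP + TN); the label encoding (0 for normal, 1 for attack) and the sample predictions are assumptions made for illustration and are not taken from the NSL-KDD experiment.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN), with 0 = normal traffic and 1 = attack.

    A false positive is normal traffic that the detector flags as an attack.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against a sample that contains no normal traffic at all.
    return fp / (fp + tn) if (fp + tn) else 0.0

# Hypothetical labels: four normal flows and two attacks.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 1, 0]  # one normal flow is wrongly flagged

fpr = false_positive_rate(y_true, y_pred)
print(fpr)  # 0.25
```

Note that the missed attack in the last position is a false negative and does not affect this metric.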
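With the growing use of information technology in all areas of life, hacking has become more damaging than ever before. As technology develops, the number of attacks grows exponentially every few months and attacks become more sophisticated, so traditional IDS become inefficient at detecting them. This paper proposes a solution that not only detects new threats with a higher detection rate and a lower false-positive rate than IDS already in use, but can also detect collective and contextual security attacks. We achieve these results by using Networking Chatbot, a deep recurrent neural network (Long Short-Term Memory, LSTM) on top of the Apache Spark framework, to detect anomalies. We propose merging the concepts of language processing, contextual analysis, distributed deep learning, big data, and anomaly detection through flow analysis. We present a model that abstracts the network's normal behaviour from sequences of millions of packets within their context and analyzes them in near real time to detect point, collective, and contextual anomalies. Experiments on the MAWI dataset show a better detection rate than not only signature-based IDS but also traditional anomaly-based IDS. The experiments show fewer false positives, a higher detection rate, and better point anomaly detection. As for evidence of contextual and collective anomaly detection, we discuss our claim and the reasoning behind our hypothesis. However, because of hardware limitations, the experiments were run on small random subsets of the dataset, so we share our experiment and our vision for future work in the hope that interested researchers with better hardware infrastructure than ours will provide a full proof in the future.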